(improvement) Optimize VectorType deserialization with struct.unpack and numpy (us level improvements - 2-13x speedup - Python path only!) by mykaul · Pull Request #730 · scylladb/python-driver

mykaul · 2026-03-07T10:01:05Z

Summary

Replace element-by-element VectorType deserialization with bulk struct.unpack for known numeric types (float, double, int32, int64, short), caching a struct.Struct object at type-creation time
Add numpy fast-path (np.frombuffer().tolist()) for vectors with >= 32 elements
Cache serial_size() results to eliminate per-call method dispatch overhead
Fix exception handling in variable-size vector path: remove dead KeyError catch, wrap subtype.deserialize failures with element context and proper exception chaining

Performance (pure Python, best of 5)

Deserialization:

Vector Config	Master	PR #730	Speedup
`Vector<float, 4>`	1.12 us	0.22 us	5.1x
`Vector<float, 16>`	3.23 us	0.35 us	9.2x
`Vector<float, 128>`	23.46 us	1.91 us	12.3x
`Vector<float, 768>`	146.07 us	11.22 us	13.0x
`Vector<float, 1536>`	293.27 us	21.98 us	13.3x

Serialization:

Vector Config	Master	PR #730	Speedup
`Vector<float, 4>`	0.55 us	0.16 us	3.4x
`Vector<float, 16>`	1.67 us	0.24 us	7.0x
`Vector<float, 128>`	11.15 us	1.01 us	11.0x
`Vector<float, 768>`	62.53 us	5.12 us	12.2x
`Vector<float, 1536>`	123.69 us	10.82 us	11.4x

serial_size() overhead:

Operation	Master	PR #730	Speedup
`serial_size()` call (768-dim)	104 ns	50 ns	2.1x
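For readers who want to reproduce numbers of this kind, a "best of 5" measurement can be sketched with `timeit.repeat`. This is only an illustration: `per_element`, `bulk_unpack`, and `best_of` are hypothetical helpers, not the driver's code or the benchmark script used for the tables above, and the per-element loop is a stand-in for the old path rather than a copy of it.

```python
import struct
import timeit

N = 768  # vector dimension, matching one of the rows in the tables above
payload = struct.pack(f">{N}f", *[float(i) for i in range(N)])

_elem = struct.Struct(">f")      # decodes one big-endian float32
_bulk = struct.Struct(f">{N}f")  # decodes all N floats in a single call


def per_element(buf):
    # One unpack call per 4-byte element (stand-in for element-by-element decoding).
    return [_elem.unpack_from(buf, i * 4)[0] for i in range(N)]


def bulk_unpack(buf):
    # Single C-level unpack of the whole buffer.
    return list(_bulk.unpack(buf))


def best_of(fn, repeat=5, number=200):
    """Best per-call time in microseconds over `repeat` runs of `number` calls."""
    runs = timeit.repeat(lambda: fn(payload), repeat=repeat, number=number)
    return min(runs) / number * 1e6


print(f"per-element: {best_of(per_element):.2f} us")
print(f"bulk:        {best_of(bulk_unpack):.2f} us")
```

Taking the minimum over the repeats, rather than the mean, reduces the influence of scheduler noise on microsecond-scale timings.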

Details

Commit 1 -- struct.unpack optimization + variable-size path fixes:

At apply_parameters() time, cache a struct.Struct('>Nf') for the vector's subtype+dimension
deserialize() calls list(struct.unpack(byts)) -- single C-level bulk unpack
Also optimizes serialization via struct.pack(*v)
Fallback for non-numeric fixed-size types uses pre-allocated result list + cached method reference
Variable-size path: remove dead KeyError from except clause (uvint_unpack only raises IndexError), wrap subtype.deserialize failures in ValueError with element index and proper exception chaining (from e)
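The caching idea in Commit 1 can be sketched as follows. This is a minimal illustration, not the PR's code: `_FORMAT_CHARS` and `make_vector_struct` are hypothetical names, and the mapping from type name to format character is assumed from the format strings listed in the commit message.

```python
import struct

# Assumed mapping from numeric subtype name to struct format character.
_FORMAT_CHARS = {
    "FloatType": "f",
    "DoubleType": "d",
    "Int32Type": "i",
    "LongType": "q",
    "ShortType": "h",
}


def make_vector_struct(subtype_name, vector_size):
    """Build the struct.Struct once, at type-creation time.

    Returns None for subtypes without a known format, so the caller
    can fall back to element-by-element handling.
    """
    fmt = _FORMAT_CHARS.get(subtype_name)
    if fmt is None:
        return None
    return struct.Struct(f">{vector_size}{fmt}")  # big-endian, N elements


# Created once, then reused for every serialize/deserialize call.
vec4f = make_vector_struct("FloatType", 4)

raw = vec4f.pack(*[1.0, 2.0, 3.5, -4.25])  # serialize: one bulk pack
values = list(vec4f.unpack(raw))           # deserialize: one bulk unpack
```

Because the format string is parsed only when the `Struct` is built, each later call pays just for the C-level pack or unpack.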

Commit 2 -- numpy for large vectors:

For vectors >= 32 elements with a known numeric dtype, use np.frombuffer(byts, dtype='>f4', count=N).tolist()
numpy avoids intermediate Python object creation during unpacking; .tolist() batch-converts with better cache locality
Threshold of 32 chosen empirically: below this, struct.unpack is faster due to lower fixed overhead
_numpy_dtype cached on the class at type-creation time (no per-call dict construction)
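A rough sketch of the hybrid dispatch described in Commit 2, assuming float32 elements only. `NUMPY_THRESHOLD` and `deserialize_floats` are hypothetical names, and for brevity the sketch builds the struct format per call instead of caching it as the PR does.

```python
import struct

try:
    import numpy as np
except ImportError:  # numpy is optional; fall back to struct only
    np = None

NUMPY_THRESHOLD = 32  # threshold quoted in the PR description


def deserialize_floats(byts, vector_size):
    """Decode `vector_size` big-endian float32 values into a Python list."""
    if np is not None and vector_size >= NUMPY_THRESHOLD:
        # frombuffer creates a view over the bytes without copying;
        # tolist() then converts all elements to Python floats in one pass.
        return np.frombuffer(byts, dtype=">f4", count=vector_size).tolist()
    # Small vectors (or no numpy): bulk struct unpack.
    return list(struct.unpack(f">{vector_size}f", byts))


small = struct.pack(">4f", 1.0, 2.0, 3.0, 4.0)
large = struct.pack(">64f", *[float(i) for i in range(64)])

print(deserialize_floats(small, 4))
print(len(deserialize_floats(large, 64)))
```

Both branches return a plain `list` of Python floats, so callers should not be able to tell which path was taken.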

Commit 3 -- serial_size caching:

Cache subtype.serial_size() result as _subtype_serial_size and the full vector serial size as _serial_size during apply_parameters()
serial_size() returns cached value directly (no method dispatch chain)
serialize() and deserialize() use cls._subtype_serial_size instead of calling cls.subtype.serial_size() each time
Eliminates ~50ns overhead per serialize/deserialize call
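The caching pattern in Commit 3 might look roughly like this toy model. `ToyFloatType` and `ToyVectorType` are hypothetical stand-ins, and this `apply_parameters` signature is simplified for illustration; it is not the driver's actual signature.

```python
class ToyFloatType:
    """Stand-in for a fixed-size subtype."""

    @classmethod
    def serial_size(cls):
        return 4


class ToyVectorType:
    """Stand-in for a parameterized vector type."""

    subtype = None
    vector_size = 0
    _subtype_serial_size = None
    _serial_size = None

    @classmethod
    def apply_parameters(cls, subtype, vector_size):
        # Resolve sizes once, when the parameterized type is created.
        sub_size = subtype.serial_size()
        total = None if sub_size is None else sub_size * vector_size
        return type(
            f"Vector_{subtype.__name__}_{vector_size}",
            (cls,),
            {
                "subtype": subtype,
                "vector_size": vector_size,
                "_subtype_serial_size": sub_size,
                "_serial_size": total,
            },
        )

    @classmethod
    def serial_size(cls):
        # Plain attribute read; no call into subtype.serial_size().
        return cls._serial_size


Vec768 = ToyVectorType.apply_parameters(ToyFloatType, 768)
print(Vec768.serial_size())
```

The hot paths then read `cls._subtype_serial_size` directly, trading one extra class attribute for a method call saved on every invocation.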

All three commits modify only cassandra/cqltypes.py. No Cython dependency.

Copilot

Pull request overview

This PR optimizes VectorType (de)serialization in cassandra/cqltypes.py by introducing bulk numeric (de)serialization via a cached struct.Struct, and an optional numpy-based deserialization fast path for larger vectors.

Changes:

Cache a per-parameterized-vector struct.Struct to bulk unpack/pack common numeric vector subtypes.
Add an optional numpy frombuffer(...).tolist() deserialization fast-path for vectors with vector_size >= 32.
Refactor variable-size vector deserialization to a fixed-iteration loop with stricter bounds checks.
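The variable-size refactor can be pictured with the sketch below. It is a simplified model, not the PR's implementation: the length prefix is assumed to be a single byte here (the driver reads an unsigned vint via `uvint_unpack`), and `deserialize_variable` and its parameters are hypothetical names.

```python
def deserialize_variable(byts, vector_size, deserialize_elem):
    """Decode exactly `vector_size` length-prefixed elements from `byts`."""
    result = []
    idx = 0
    for i in range(vector_size):  # fixed iteration count
        try:
            length = byts[idx]  # raises IndexError if the buffer is too short
        except IndexError as e:
            raise ValueError(
                f"Truncated vector: no length prefix for element {i}"
            ) from e
        idx += 1
        end = idx + length
        if end > len(byts):  # bounds check before slicing
            raise ValueError(f"Truncated vector: element {i} exceeds buffer")
        try:
            result.append(deserialize_elem(byts[idx:end]))
        except Exception as e:
            # Add element context while keeping the original error as __cause__.
            raise ValueError(f"Failed to deserialize element {i}") from e
        idx = end
    return result


def decode_utf8(chunk):
    return chunk.decode("utf-8")


print(deserialize_variable(b"\x02hi\x03abc", 2, decode_utf8))
```

Using `raise ... from e` keeps the underlying exception reachable through `__cause__`, so the traceback shows both the failing element index and the root error.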


…ct.unpack

Add bulk deserialization using struct.unpack for common numeric vector types instead of element-by-element deserialization. This provides significant performance improvements, especially for small vectors and integer types.

Optimized types:
- FloatType ('>Nf' format)
- DoubleType ('>Nd' format)
- Int32Type ('>Ni' format)
- LongType ('>Nq' format)
- ShortType ('>Nh' format)

Performance improvements (measured with CASS_DRIVER_NO_CYTHON=1):

Small vectors (3-4 elements):
Vector<float, 3> : 0.88 μs → 0.25 μs (3.58x faster)
Vector<float, 4> : 0.78 μs → 0.28 μs (2.79x faster)

Medium vectors (128 elements):
Vector<float, 128> : 4.72 μs → 4.06 μs (1.16x faster)
Vector<double, 128> : 4.83 μs → 4.01 μs (1.20x faster)
Vector<int, 128> : 2.27 μs → 1.25 μs (1.82x faster)

Large vectors (384-1536 elements):
Vector<float, 384> : 15.38 μs → 14.67 μs (1.05x faster)
Vector<float, 768> : 32.43 μs → 30.72 μs (1.06x faster)
Vector<float, 1536> : 63.74 μs → 63.24 μs (1.01x faster)

The optimization is most effective for:
- Small vectors (3-4 elements): 2.8-3.6x speedup
- Integer vectors: 1.8x speedup
- Medium-sized float/double vectors: 1.2-1.3x speedup

For very large vectors (384+ elements), the benefit is minimal as the deserialization time is dominated by data copying rather than function call overhead.

Variable-size subtypes and other numeric types continue to use the element-by-element fallback path.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

For vectors with 32 or more elements, use numpy.frombuffer() which provides 1.3-1.5x speedup for large vectors (128+ elements) compared to struct.unpack.

The hybrid approach:
- Small vectors (< 32 elements): struct.unpack (2.8-3.6x faster than baseline)
- Large vectors (>= 32 elements): numpy.frombuffer().tolist() (1.3-1.5x faster than struct.unpack)

Threshold of 32 elements balances code complexity with performance gains.

Benchmark results:
- float[128]: 2.15 μs → 1.87 μs (1.15x faster)
- float[384]: 6.17 μs → 4.44 μs (1.39x faster)
- float[768]: 12.25 μs → 8.45 μs (1.45x faster)
- float[1536]: 24.44 μs → 15.77 μs (1.55x faster)

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

…ated method dispatch

Cache subtype.serial_size() and the full vector serial_size() as class attributes (_subtype_serial_size, _serial_size) during apply_parameters(). This eliminates per-call method dispatch overhead in serialize(), deserialize(), and serial_size() hot paths.

serial_size() call: 99ns -> 46ns (2.2x faster)
Attribute access: 54ns -> 17ns (3.2x faster)

Lorak-mmk · 2026-05-20T10:32:15Z

@mykaul This is not a draft, but review was not requested. Please either change to draft, or request review.

mykaul · 2026-05-20T10:58:21Z

> @mykaul This is not a draft, but review was not requested. Please either change to draft, or request review.

It's an improvement, not a fix. I believe it's ready, but I don't want to disrupt the team. I'm not sure what to do (and I do it for fun anyway). If there's anything that I see as important - I'm not shy.

mykaul marked this pull request as draft March 7, 2026 10:22

mykaul mentioned this pull request Mar 14, 2026

Tracking: Vector search (VectorType) performance improvement PRs #746

Open

mykaul requested a review from Copilot March 16, 2026 18:14

Copilot started reviewing on behalf of mykaul March 16, 2026 18:14

Copilot AI reviewed Mar 16, 2026


mykaul mentioned this pull request Apr 2, 2026

[DO NOT MERGE] (Improvement) improve performance of Vector type parsing #689

Draft

8 tasks

mykaul added 2 commits April 2, 2026 13:51

mykaul force-pushed the vector-struct-numpy-deser branch from c417e73 to 0535ecd Compare April 2, 2026 10:52

mykaul self-assigned this Apr 2, 2026

mykaul marked this pull request as ready for review April 2, 2026 12:22

mykaul changed the title ~~(improvement) Optimize VectorType deserialization with struct.unpack and numpy~~ (improvement) Optimize VectorType deserialization with struct.unpack and numpy (us level improvements - 2-13x speedup - Python path only) Apr 7, 2026

Lorak-mmk force-pushed the master branch from f2a9e87 to 763af09 Compare June 15, 2026 10:57

mykaul wants to merge 3 commits into scylladb:master from mykaul:vector-struct-numpy-deser

3 participants
